Diffusion models, a new generative modelling paradigm, have achieved great success in image, audio, and video generation. However, given the discrete, categorical nature of text, extending continuous diffusion models to natural language is not trivial, and text diffusion models remain less studied. Sequence-to-sequence text generation is one of the essential natural language processing topics. In this work, we apply diffusion models to sequence-to-sequence text generation and explore whether the superior generation performance of diffusion models can transfer to the natural language domain. We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation. SeqDiffuSeq uses an encoder-decoder Transformer architecture to model the denoising function. To improve generation quality, SeqDiffuSeq combines the self-conditioning technique with a newly proposed adaptive noise schedule technique. The adaptive noise schedule distributes the difficulty of denoising evenly across time steps and uses exclusive noise schedules for tokens at different positions. Experimental results show good performance on sequence-to-sequence generation in terms of both text quality and inference time.
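The idea of spreading denoising difficulty evenly across time steps can be illustrated with a small sketch. It assumes, as an illustration only, that difficulty is measured by a denoising loss recorded on a grid of noise levels (alpha-bar values), and that the schedule is recovered by piecewise-linear inverse interpolation of that loss curve at evenly spaced loss targets. The function name `adaptive_schedule` and the toy loss tables are hypothetical; this is not the authors' implementation.

```python
from bisect import bisect_left

def adaptive_schedule(alpha_bars, losses, num_steps):
    """Pick num_steps noise levels so that denoising difficulty grows evenly.

    alpha_bars: a grid of noise levels ordered from least noisy (near 1) to
        most noisy (near 0).
    losses: the denoising loss recorded at each grid point; assumed to be
        non-decreasing as noise increases.
    Returns one alpha-bar per time step t = 1..num_steps, obtained by
    piecewise-linear inverse interpolation of the loss curve at evenly
    spaced loss targets.
    """
    lo, hi = losses[0], losses[-1]
    schedule = []
    for t in range(1, num_steps + 1):
        target = lo + (hi - lo) * t / num_steps
        # First grid index whose loss reaches the target, clamped to a valid segment.
        i = min(max(bisect_left(losses, target), 1), len(losses) - 1)
        l0, l1 = losses[i - 1], losses[i]
        a0, a1 = alpha_bars[i - 1], alpha_bars[i]
        frac = 0.0 if l1 == l0 else (target - l0) / (l1 - l0)
        schedule.append(a0 + frac * (a1 - a0))
    return schedule

# Hypothetical per-position loss tables on a shared alpha-bar grid:
# each token position gets its own schedule.
grid = [1.0, 0.5, 0.0]
tables = {0: [0.0, 3.0, 4.0], 1: [0.0, 1.0, 4.0]}
schedules = {pos: adaptive_schedule(grid, ls, 4) for pos, ls in tables.items()}
```

Where the loss rises steeply (position 0 between alpha-bar 1.0 and 0.5), more time steps are packed into that range, which is one way to read "evenly distributed difficulty".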
Language models with the Transformer architecture have shown great performance in natural language processing. However, fine-tuning pre-trained language models (PLMs) on downstream tasks still poses problems such as over-fitting and representation collapse. In this work, we propose HyPe, a simple yet effective fine-tuning technique that alleviates such problems by perturbing the hidden representations of Transformer layers. Unlike previous works that only add noise to inputs or parameters, we argue that the hidden representations of Transformer layers convey more diverse and meaningful language information. Therefore, making the Transformer layers more robust to hidden representation perturbations can further benefit the fine-tuning of PLMs as a whole. We conduct extensive experiments and analyses on GLUE and other natural language inference datasets. Results demonstrate that HyPe outperforms vanilla fine-tuning and enhances the generalization of hidden representations from different layers. In addition, HyPe incurs negligible computational overhead, and it both outperforms and is compatible with previous state-of-the-art fine-tuning techniques.
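The core mechanism, perturbing the hidden representations passed between layers during training only, can be sketched as follows. Assumptions: the noise is i.i.d. Gaussian with a small, arbitrarily chosen scale `sigma`, it is added to the hidden state entering every layer after the first, and plain Python functions stand in for Transformer layers. The names `perturb` and `forward` are hypothetical; the paper's exact noise distribution and placement may differ.

```python
import random

def perturb(hidden, sigma, rng):
    """Add i.i.d. Gaussian noise N(0, sigma^2) to every coordinate of a hidden vector."""
    return [h + rng.gauss(0.0, sigma) for h in hidden]

def forward(layers, x, training, sigma=1e-5, rng=None):
    """Run x through a stack of layers.

    During training, the hidden representation passed from one layer to the
    next is perturbed; at inference time the forward pass is unchanged.
    """
    if rng is None:
        rng = random.Random(0)
    h = x
    for i, layer in enumerate(layers):
        if training and i > 0:  # perturb hidden states, not the raw input
            h = perturb(h, sigma, rng)
        h = layer(h)
    return h

# Toy "layers" standing in for Transformer layers.
layers = [lambda h: [2.0 * v for v in h], lambda h: [v + 1.0 for v in h]]
clean = forward(layers, [1.0, 2.0], training=False)
noisy = forward(layers, [1.0, 2.0], training=True, sigma=0.1)
```

Because the noise is only injected when `training=True`, inference cost is untouched, which is consistent with the negligible overhead claimed above.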
Detecting sarcasm and verbal irony in people's subjective statements is crucial to understanding their intended meanings, real sentiments, and positions in social scenarios. This paper describes the X-PuDu system that participated in SemEval-2022 Task 6, iSarcasmEval: Intended Sarcasm Detection in English and Arabic, which aims at detecting intended sarcasm in various settings of natural language understanding. Our solution fine-tunes pre-trained language models, such as ERNIE-M and DeBERTa, under multilingual settings to recognize irony in Arabic and English texts. Our system ranked second out of 43 (English) and ninth out of 32 (Arabic) in Task A, one-sentence detection; fifth out of 22 in Task B, binary multi-label classification in English; and first out of 16 (English) and fifth out of 13 (Arabic) in Task C, sentence-pair detection.
In recent years, interest has grown in using machine learning to improve the efficiency of automatic medical consultation and enhance the patient experience. In this article, we propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction. We create a new large medical dialogue dataset with multi-level fine-grained annotations and establish five independent tasks: named entity recognition, dialogue act classification, symptom label inference, medical report generation, and diagnosis-oriented dialogue policy. We report a set of benchmark results for each task, which show the usability of the dataset and set a baseline for future studies. Both code and data are available from https://github.com/lemuria-wchen/imcs21.
With increasing scale, large language models such as GPT-3 demonstrate both quantitative improvements and new qualitative capabilities, especially as zero-shot learners. However, these results rely heavily on delicate prompt design and large amounts of computation. In this work, we explore whether strong zero-shot ability can be achieved at a smaller model scale without any external supervised data. To achieve this goal, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short), which uses a small amount of task-aware self-supervised data to further update language models. Experiments show that Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with those of large language models such as T5-XL (3B). We also apply Go-tuning in multi-task settings and develop a multi-task model, mgo-T5 (250M), which reaches the average performance of OPT (175B) on 9 datasets.
In this paper, we introduce a novel optimization algorithm for machine learning model training, called Normalized Stochastic Gradient Descent (NSGD), inspired by Normalized Least Mean Squares (NLMS) from adaptive filtering. When training a high-complexity model on a large dataset, the learning rate is critically important, because a poor choice of optimizer parameters can lead to divergence. The algorithm computes the new set of network weights using the stochastic gradient, but with $\ell_1$- and $\ell_2$-based normalizations of the learning rate parameter, similar to the NLMS algorithm. Our main difference from existing normalization methods is that we do not include the error term in the normalization process; instead, we normalize the update term using the input vector to the neuron. Our experiments show that, with our optimization algorithm, the model can be trained to a better accuracy level under different initial settings. We demonstrate the efficiency of our training algorithm using ResNet-20 and a toy neural network on different benchmark datasets with different initializations. NSGD improves the accuracy of ResNet-20 from 91.96% to 92.20% on the CIFAR-10 dataset.
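The normalization described above can be sketched for a single linear neuron: the learning rate is divided by a norm of the neuron's input vector (squared $\ell_2$ as in NLMS, or $\ell_1$), while the gradient itself is left as is. Assumptions: a linear neuron with squared-error loss, a small constant `eps` to avoid division by zero, and the hypothetical helper names `nsgd_step` and `squared_error_grad`. This is an illustration of the stated idea, not the paper's code for ResNet-20.

```python
def nsgd_step(w, x, grad, lr, norm="l2", eps=1e-8):
    """One NSGD-style update for the weights w of a single neuron with input x.

    The learning rate is divided by a norm of the input vector (squared l2
    norm as in NLMS, or l1 norm); the error term plays no part in the scaling.
    """
    if norm == "l2":
        scale = eps + sum(v * v for v in x)
    elif norm == "l1":
        scale = eps + sum(abs(v) for v in x)
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    step = lr / scale
    return [wi - step * gi for wi, gi in zip(w, grad)]

def squared_error_grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w, for a linear neuron."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

# Toy run: recover w_true = [2, -1] from noiseless samples.
w_true = [2.0, -1.0]
data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
w = [0.0, 0.0]
for _ in range(200):
    for x in data:
        y = sum(a * b for a, b in zip(w_true, x))
        w = nsgd_step(w, x, squared_error_grad(w, x, y), lr=0.5)
```

With the $\ell_2$ option and a linear neuron, this reduces to the classic NLMS update, so inputs with large magnitude no longer produce proportionally large steps.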
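Dog owners are typically able to recognize behavioral cues that reveal their dogs' subjective states, such as pain. Recognizing pain states automatically, however, is very challenging. This paper proposes a novel video-based, two-stream deep neural network approach to address this problem. We extract and preprocess body keypoints, and compute features from both the keypoints and the RGB representation of the video. We propose a method to handle self-occlusion and missing keypoints. We also present a unique video-based dog behavior dataset, collected by veterinary professionals and annotated for pain, and report good classification results with the proposed approach. This study is one of the first works on machine-learning-based estimation of pain states in dogs.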
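The abstract does not specify how missing keypoints are handled. As a point of reference only, one common generic approach is to fill gaps in each keypoint's per-frame trajectory by linear interpolation between the nearest observed frames, copying the nearest value at the clip boundaries. The sketch below assumes that approach and the hypothetical name `fill_missing`; the paper's own method may differ.

```python
def fill_missing(track):
    """Fill None entries in one keypoint coordinate track (one value per frame).

    Interior gaps are linearly interpolated between the nearest observed
    frames; leading and trailing gaps copy the nearest observed value.
    """
    known = [i for i, v in enumerate(track) if v is not None]
    if not known:
        raise ValueError("keypoint is never observed in this clip")
    out = list(track)
    for i in range(known[0]):
        out[i] = track[known[0]]
    for i in range(known[-1] + 1, len(track)):
        out[i] = track[known[-1]]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            out[i] = track[a] + frac * (track[b] - track[a])
    return out

# x-coordinate of one keypoint over six frames; None marks an occluded frame.
xs = fill_missing([None, 1.0, None, None, 4.0, None])
```

In practice the function would be applied separately to the x and y coordinates of every keypoint before features are computed.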
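We introduce an exploratory active learning (AL) algorithm based on Gaussian process regression and the marginalized graph kernel (GPR-MGK) to explore chemical space at minimal cost. Using high-throughput molecular dynamics simulations to generate data and a graph neural network (GNN) for prediction, we construct an active-learning molecular simulation framework for thermodynamic property prediction. Specifically targeting 251,728 alkane molecules consisting of 4 to 19 carbon atoms and their liquid physical properties (density, heat capacity, and enthalpy of vaporization), we use the AL algorithm to select the most informative molecules to represent the chemical space. Validation on computational and experimental test sets shows that only 313 molecules (0.124% of the total) are sufficient to train an accurate GNN model, with $R^2 > 0.99$ on the computational test set and $R^2 > 0.94$ on the experimental test set. We highlight two advantages of the proposed AL algorithm: compatibility with high-throughput data generation and reliable uncertainty quantification.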
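The selection step of Gaussian-process-based active learning can be illustrated with a minimal sketch. Assumptions: molecules are represented as plain feature vectors compared with an RBF kernel, which is a swapped-in stand-in for the marginalized graph kernel; the acquisition rule is plain uncertainty sampling (pick the candidate with the largest posterior variance), which may not be the paper's exact criterion; and all function names are hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """RBF kernel on feature vectors (stand-in for a marginalized graph kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * length ** 2))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def posterior_variance(train, x, noise=1e-6):
    """GP predictive variance at x given training inputs (labels are not needed)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(train)]
         for i, a in enumerate(train)]
    k = [rbf(a, x) for a in train]
    return rbf(x, x) - sum(ki * zi for ki, zi in zip(k, solve(K, k)))

def explore(seed, pool, budget):
    """Greedily move the most uncertain pool candidate into the training set."""
    chosen, remaining = list(seed), list(pool)
    for _ in range(budget):
        i = max(range(len(remaining)),
                key=lambda j: posterior_variance(chosen, remaining[j]))
        chosen.append(remaining.pop(i))
    return chosen

picked = explore([[0.0]], [[0.1], [3.0], [0.5], [1.5]], 2)
```

A useful property of this design: with fixed kernel hyperparameters, the GP posterior variance does not depend on the labels, so candidates can be chosen before their properties are simulated, which fits a high-throughput data-generation loop.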
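Unsupervised generation of virtual humans with diverse appearances and animatable poses is important for creating 3D human avatars and other AR/VR applications. Existing methods are either limited to rigid object modeling or are not generative, and thus cannot synthesize high-quality virtual humans and animate them. In this work, we propose AvatarGen, the first method that not only generates non-rigid humans with diverse appearances but also offers full control over poses and viewpoints, while requiring only 2D images for training. Specifically, it extends recent 3D GANs to clothed human generation by using a coarse human body model as a proxy to warp the observation space into a standard avatar in a canonical space. To model non-rigid dynamics, it introduces a deformation network that learns pose-dependent deformations in the canonical space. To improve the geometric quality of the generated human avatars, it uses a signed distance field as the geometric representation, which allows more direct regularization from the body model on geometry learning. Benefiting from these designs, our method can generate animatable human avatars with high-quality appearance and geometry modeling, significantly outperforming previous 3D GANs. Furthermore, it supports many applications, such as single-view reconstruction, reanimation, and text-guided synthesis. Code and pre-trained models will be made available.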
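Multimodal sentiment analysis and depression estimation are two important research topics that aim to predict human mental states using multimodal data. Previous research has focused on developing effective fusion strategies for exchanging and integrating mind-related information from different modalities. Some MLP-based techniques have recently achieved great success in a variety of computer vision tasks. Inspired by this, we explore multimodal approaches from a feature-mixing perspective in this study. To this end, we introduce CubeMLP, a multimodal feature processing framework based entirely on MLPs. CubeMLP consists of three independent MLP units, each of which has two affine transformations. CubeMLP accepts all relevant modality features as input and mixes them across three axes. After features are extracted with CubeMLP, the mixed multimodal features are flattened for task prediction. Our experiments are conducted on the sentiment analysis datasets CMU-MOSI and CMU-MOSEI, and on the depression estimation dataset AVEC2019. The results show that CubeMLP achieves state-of-the-art performance at a much lower computational cost.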
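The three-axis mixing can be sketched with plain nested lists. This is an illustrative reading of the abstract, not the authors' code: the input is assumed to be a (sequence, modality, channel) tensor, each MLP unit is assumed to be two affine maps with a ReLU between them, and the units are applied along the three axes one after another. Normalization, residual connections, and real layer sizes are omitted, and all function names are hypothetical.

```python
def permute(X, order):
    """Reorder the axes of a 3-D nested list, like numpy.transpose(X, order)."""
    shape = (len(X), len(X[0]), len(X[0][0]))
    new = [shape[a] for a in order]
    out = [[[None] * new[2] for _ in range(new[1])] for _ in range(new[0])]
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                idx = (i, j, k)
                out[idx[order[0]]][idx[order[1]]][idx[order[2]]] = X[i][j][k]
    return out

def mlp(v, W1, b1, W2, b2):
    """Two affine transformations with a ReLU in between (the ReLU is an assumption)."""
    h = [max(0.0, sum(w * x for w, x in zip(row, v)) + b) for row, b in zip(W1, b1)]
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W2, b2)]

def mix_axis(X, axis, f):
    """Apply f to every 1-D fiber of X along `axis` (0, 1 or 2)."""
    order = [0, 1, 2]
    order[axis], order[2] = 2, axis  # swap `axis` with the last axis
    Y = permute(X, order)
    Y = [[f(fiber) for fiber in plane] for plane in Y]
    return permute(Y, order)  # a swap is its own inverse

def cube_mlp(X, params):
    """X has shape (sequence, modality, channel); params holds one
    (W1, b1, W2, b2) tuple per axis, applied in that order."""
    for axis, p in enumerate(params):
        X = mix_axis(X, axis, lambda v, p=p: mlp(v, *p))
    # Flatten the mixed features for the task-prediction head.
    return [x for plane in X for row in plane for x in row]
```

Mixing along each axis separately keeps every weight matrix sized to a single axis, which is one plausible source of the low computational cost reported above.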